
5 research outputs found

Emotion Recognition from Acted and Spontaneous Speech

Author: Atassi Hicham
Publication venue: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Publication date: 01/01/2014

This doctoral thesis deals with emotion recognition from speech signals. The thesis is divided into two main parts. The first part describes the proposed approaches to emotion recognition using two different multilingual databases of acted emotional speech. The main contributions of this part are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as “emotion coupling”, and a new method for mapping discrete emotions into a two-dimensional space. The second part is devoted to emotion recognition using multilingual databases of spontaneous emotional speech, which are based on telephone recordings obtained from real call centers. The knowledge gained from the experiments with acted speech was used to design a new approach for classifying seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker’s emotional state on gender recognition performance and proposes a system for automatic identification of successful phone calls in call centers by means of dialogue features.

Digital library of Brno University of Technology

National Repository of Grey Literature

Emotional vocal expressions recognition using the COST 2102 Italian database of emotional speech

Authors: Atassi Hicham; Esposito Anna; Hussain Amir; Riviello Maria Teresa; Smekal Zdenek
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

The present paper proposes a new speaker-independent approach to the classification of emotional vocal expressions using the COST 2102 Italian database of emotional speech. The audio recordings, extracted from video clips of Italian movies, possess a certain degree of spontaneity and are either noisy or slightly degraded by an interruption, which makes the collected stimuli more realistic than those in available emotional databases containing utterances recorded under studio conditions. The audio stimuli represent 6 basic emotional states: happiness, sarcasm/irony, fear, anger, surprise, and sadness. Under these more realistic conditions, and using a speaker-independent approach, the proposed system classifies the emotions under examination with 60.7% accuracy by using a hierarchical structure consisting of a Perceptron and fifteen Gaussian Mixture Models (GMM) trained to distinguish within each pair (couple) of emotions under examination. The most discriminative features were selected with the Sequential Floating Forward Selection (SFFS) algorithm from a large number of spectral, prosodic and voice quality features. The results were compared with the subjective evaluation of the stimuli provided by human subjects.
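The count of fifteen pairwise models follows from the six emotion classes, since 6 choose 2 = 15. The Python sketch below enumerates those pairs and shows one simple way pairwise decisions could be combined, by majority vote. The voting step is an assumption made for illustration only; the abstract states that a Perceptron is part of the hierarchical structure but does not detail the combination. `pairwise_winner` is a hypothetical stand-in for a trained pairwise GMM, and the scores are made up.

```python
from collections import Counter
from itertools import combinations

# The six emotional states named in the abstract.
EMOTIONS = ["happiness", "sarcasm/irony", "fear", "anger", "surprise", "sadness"]

# One binary model per unordered pair of emotions: C(6, 2) = 15.
PAIRS = list(combinations(EMOTIONS, 2))


def pairwise_winner(pair, scores):
    """Hypothetical stand-in for a trained pairwise model.

    Picks whichever emotion of the pair has the higher toy score.
    """
    a, b = pair
    return a if scores[a] >= scores[b] else b


def classify_by_vote(scores):
    """Combine the pairwise decisions by majority vote (illustrative assumption)."""
    votes = Counter(pairwise_winner(pair, scores) for pair in PAIRS)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    toy_scores = {e: 0.0 for e in EMOTIONS}
    toy_scores["anger"] = 1.0  # made-up value, not data from the paper
    print(len(PAIRS))
    print(classify_by_vote(toy_scores))
```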

Stirling Online Research Repository (RIOXX)

Stirling Online Research Repository

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

A novel clinical expert system for chest pain risk assessment

Authors: Atassi Hicham; Eckl Chris; Farooq Kamran; Hussain Amir; Leslie Stephen; MacRae Calum; Slack Warner
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Rapid access chest pain clinics (RACPC) enable clinical risk assessment, investigation and arrangement of a treatment plan for chest pain patients without a long waiting list. RACPC clinicians often experience difficulties in the diagnosis of chest pain due to the inherent complexity of the clinical process and the lack of comprehensive automated diagnostic tools. To date, various risk assessment models have been proposed, inspired by the National Institute of Clinical Excellence (NICE) guidelines, to provide a clinical decision support mechanism in chest pain diagnosis. The aim of this study is to help improve the performance of RACPC, specifically from the clinical decision support perspective. The study cohort comprises 632 patients suspected of cardiac chest pain. A retrospective data analysis of the clinical studies evaluating 14 risk factors for chest pain patients was performed for the development of RACPC-specific risk assessment models to distinguish between cardiac and non-cardiac chest pain. In the first phase, a novel binary classification model was developed using a Decision Tree algorithm in conjunction with forward and backward selection wrapping techniques. In the second phase, a logistic regression model was trained using all of the given variables combined with forward and backward feature selection techniques to identify the most significant features. The new models showed very good predictive power, demonstrating a general performance improvement compared to a state-of-the-art prediction model.
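As a rough illustration of the forward selection wrapper mentioned in the abstract, the Python sketch below greedily adds the feature that most improves a score and stops when no candidate helps. It is a generic sketch, not the study's implementation: the feature names `f1` to `f3`, the weights, and `toy_score` are hypothetical placeholders for a real model evaluation such as cross-validated accuracy.

```python
def forward_select(features, score, min_gain=1e-9):
    """Greedy forward selection: add one feature at a time while the score improves."""
    selected = []
    remaining = list(features)
    best = score(selected)
    while remaining:
        # Evaluate each remaining feature when added to the current subset.
        cand_score, cand = max(
            ((score(selected + [f]), f) for f in remaining),
            key=lambda t: t[0],
        )
        if cand_score - best <= min_gain:
            break  # no candidate improves the score enough
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected


# Made-up per-feature contributions; a real wrapper would train and
# evaluate a classifier on each candidate subset instead.
TOY_WEIGHTS = {"f1": 0.3, "f2": 0.2, "f3": 0.0}


def toy_score(subset):
    """Hypothetical score: sum of made-up weights for the chosen features."""
    return sum(TOY_WEIGHTS[f] for f in subset)


if __name__ == "__main__":
    print(forward_select(TOY_WEIGHTS, toy_score))
```

Backward selection works the other way round: it starts from the full feature set and removes the feature whose removal hurts the score least.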

Stirling Online Research Repository (RIOXX)

Stirling Online Research Repository

A Novel Cardiovascular Decision Support Framework for Effective Clinical Risk Assessment

Authors: Atassi Hicham; Farooq Kamran; Hussain Amir; Karasek Jan; Luo Bin; MacRae Calum; Mahmud Mufti; Slack Warner; Yang Peipei
Publication date: 01/01/2014

Institutional Repository Universiteit Antwerpen

Automatic Processing Pipeline for Collecting and Annotating Air-Traffic Voice Communication Data

Authors: Alexander Blatt; Allan Tart; Amrutha Prasad; Chloe Salamin; Claudia Cevenini; Dietrich Klakow; Fabian Landis; Hicham Atassi; Igor Szöke; Iuliia Nigmatulina; Jan Černocký; Juan Zuluaga-Gomez; Karel Veselý; Khalid Choukri; Martin Kocour; Mickael Rigault; Pavel Kolčárek; Petr Motlíček; Saeed Sarfjoo; Santosh Kesiraju
Publication venue: MDPI AG
Publication date: 31/12/2021

This document describes the pipeline for automatic processing of ATCO-pilot audio communication that we developed as part of the ATCO2 project. So far, we have collected two thousand hours of audio recordings, which we either preprocessed for the transcribers or used for semi-supervised training. Both ways of using the collected data can further improve our pipeline by retraining our models. The proposed automatic processing pipeline is a cascade of many standalone components: (a) segmentation, (b) volume control, (c) signal-to-noise ratio filtering, (d) diarization, (e) ‘speech-to-text’ (ASR) module, (f) English language detection, (g) call-sign code recognition, (h) ATCO/pilot classification and (i) highlighting commands and values. The key component of the pipeline is a speech-to-text transcription system that has to be trained with real-world ATC data; otherwise, the performance is poor. To further improve speech-to-text performance, we apply both semi-supervised training with our recordings and contextual adaptation that uses a list of plausible callsigns from surveillance data as auxiliary information. Downstream NLP/NLU tasks are important from an application point of view. These application tasks need accurate models operating on top of the real speech-to-text output; thus, more data is needed for them too. Creating ATC data is the main aspiration of the ATCO2 project. At the end of the project, the data will be packaged and distributed by ELDA.
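The cascade of standalone components described above can be pictured as a chain of stages, each of which either passes a segment on or rejects it. The Python sketch below is a minimal illustration under stated assumptions: the stage names come from the abstract, but `run_cascade`, `tag`, `snr_filter`, the `snr_db` field and the 5.0 dB threshold are all hypothetical and do not reflect the ATCO2 code.

```python
from typing import Callable, List, Optional

Segment = dict
Stage = Callable[[Segment], Optional[Segment]]


def run_cascade(segment: Segment, stages: List[Stage]) -> Optional[Segment]:
    """Pass a segment through each stage in order; stop if a stage rejects it."""
    for stage in stages:
        result = stage(segment)
        if result is None:
            return None
        segment = result
    return segment


def tag(name: str) -> Stage:
    """Placeholder stage that only records that it ran (no real processing)."""
    def stage(segment: Segment) -> Optional[Segment]:
        return {**segment, "done": segment.get("done", []) + [name]}
    return stage


def snr_filter(min_snr_db: float) -> Stage:
    """Stage that rejects segments below an SNR threshold (threshold is made up)."""
    def stage(segment: Segment) -> Optional[Segment]:
        return segment if segment["snr_db"] >= min_snr_db else None
    return stage


# Stage order follows items (a) to (i) in the abstract.
STAGES: List[Stage] = [
    tag("segmentation"),
    tag("volume control"),
    snr_filter(5.0),
    tag("diarization"),
    tag("speech-to-text"),
    tag("English language detection"),
    tag("call-sign code recognition"),
    tag("ATCO/pilot classification"),
    tag("highlighting commands and values"),
]

if __name__ == "__main__":
    print(run_cascade({"snr_db": 10.0}, STAGES))
    print(run_cascade({"snr_db": 1.0}, STAGES))
```

Keeping each component behind the same small interface is one way to let stages be retrained or swapped independently, which matches the abstract's description of them as standalone.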

Multidisciplinary Digital Publishing Institute